- Sundar Pichai thinks OpenAI might have breached YouTube's terms of service when it trained Sora.
- The ChatGPT-maker wowed the AI industry when it debuted its text-to-video model in February.
- OpenAI's CTO Mira Murati said she wasn't sure if Sora was trained on YouTube videos.
OpenAI might've breached YouTube's terms and conditions to train its text-to-video model Sora, says Google CEO Sundar Pichai.
"So you felt like they had broken your terms and conditions, or potentially, or if they had, that wouldn't have been appropriate?" Nilay Patel, the editor-in-chief of The Verge, asked Pichai in an interview published Monday.
"That's right. Yes, that's right," Pichai replied.
Earlier in the interview, Pichai revealed that YouTube was still "following up and trying to understand" how OpenAI had trained Sora.
"Look we don't know the details," Pichai said. "We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that's how I felt about it."
In February, the ChatGPT-maker wowed the AI industry when it debuted Sora to the world. The model, which takes its name from the Japanese word for "sky," is capable of generating high-quality videos from a simple text prompt.
But OpenAI has remained coy about the data it used to train Sora. The company's CTO Mira Murati told The Wall Street Journal's Joanna Stern in March that it "used publicly available data and licensed data."
Murati, however, gave a far less definitive answer when Stern asked if OpenAI had taken data from platforms like YouTube and Instagram.
"I'm actually not sure about that," Murati replied. "You know, if they were publicly available to use, there might be data. But I'm not sure. I'm not confident about it."
Last month, YouTube CEO Neal Mohan told Bloomberg's Emily Chang that while he didn't know if OpenAI had trained Sora on YouTube videos, it would've been a "clear violation" of the platform's terms of use if it had.
"From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by," Mohan said.
"It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service," he continued. "Those are the rules of the road in terms of content on our platform."
Representatives for Google and OpenAI didn't immediately respond to requests for comment from BI sent outside regular business hours.
OpenAI's YouTube troubles underscore the challenges faced by data-hungry AI companies trying to train their models. In October, Amazon-backed AI startup Anthropic said that it was using data that it generated itself to train its models.
And this wouldn't be the only time OpenAI has courted controversy with how it works with content and creators.
On Monday, actress Scarlett Johansson said she was "shocked" and "angered" after the voice of OpenAI's brand-new virtual assistant sounded "eerily similar" to hers.
Johansson said in a statement that she had turned down OpenAI CEO Sam Altman's offer to voice its latest GPT-4o model.
The model, which was released last week, included several voice options. Many social media users felt that one of the voices, named "Sky," sounded like an AI chatbot that Johansson voiced in Spike Jonze's "Her." OpenAI said on Sunday that it was pausing "Sky's" release.
"We believe that AI voices should not deliberately mimic a celebrity's distinctive voice — Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice," OpenAI wrote in a blog post on the same day.